Active Learning for Unbalanced Data in the Challenge with Multiple Models and Biasing

Authors

  • Y. Chen
  • S. Mani
Abstract

The common uncertainty sampling approach searches for the most uncertain samples, those closest to the decision boundary of a classification task. However, it can fail to find these samples when the underlying probabilistic model is poor. In this work, we develop an active learning strategy called “Uncertainty Sampling with Biasing Consensus” (USBC), which predicts labels for unbalanced data with a multi-model committee and ranks the informativeness of samples by uncertainty sampling with a higher weight on the minority class. For prediction, USBC uses multiple Random Forest-based models that generate a consensus posterior probability for each sample. To further improve initial performance in active learning, we also use a semi-supervised learning model that self-labels predicted negative samples without querying. For more stable initial performance, we use a filter to avoid querying samples with high variance. We also introduce batch size validation to find the optimal initial batch size for querying samples in active learning.
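
The query-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the averaging of committee outputs, the scoring function, the variance filter over committee predictions, and every name and default value here (`minority_weight`, `max_variance`, `threshold`) are assumptions chosen for illustration. The input is assumed to be, for each unlabeled sample, the list of positive-class (minority) probabilities predicted by the committee members.

```python
from statistics import mean, pvariance


def biased_uncertainty(p_pos, minority_weight=2.0):
    """Score a consensus probability; the score peaks at p_pos = 0.5.

    Assumption: samples leaning toward the minority (positive) class get
    their uncertainty multiplied by `minority_weight`.
    """
    uncertainty = 1.0 - abs(2.0 * p_pos - 1.0)
    return uncertainty * (minority_weight if p_pos >= 0.5 else 1.0)


def select_queries(committee_probs, batch_size,
                   minority_weight=2.0, max_variance=0.05):
    """Return indices of the samples to query, most informative first."""
    scored = []
    for idx, probs in enumerate(committee_probs):
        # Assumed filter: skip samples whose committee predictions vary a lot.
        if pvariance(probs) > max_variance:
            continue
        consensus = mean(probs)  # assumed consensus: plain average
        scored.append((biased_uncertainty(consensus, minority_weight), idx))
    # Highest score first; ties are broken by the lower index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored[:batch_size]]


def self_label_negatives(committee_probs, threshold=0.1):
    """Return indices that would be labeled negative without querying.

    Assumption: a sample is self-labeled when its consensus probability
    is at or below `threshold`.
    """
    return [idx for idx, probs in enumerate(committee_probs)
            if mean(probs) <= threshold]


# Hypothetical committee outputs: 5 samples, 3 models each.
committee_probs = [
    [0.5, 0.5, 0.5],     # maximally uncertain
    [0.6, 0.6, 0.6],     # uncertain, leaning toward the minority class
    [0.4, 0.4, 0.4],     # uncertain, leaning toward the majority class
    [0.1, 0.9, 0.5],     # strong committee disagreement, filtered out
    [0.05, 0.05, 0.05],  # confidently negative
]
queries = select_queries(committee_probs, batch_size=2)
auto_negatives = self_label_negatives(committee_probs)
```

In this toy input, samples 1 and 2 are equally far from 0.5, but the minority-side weight ranks sample 1 higher, which is the biasing effect the abstract describes in words.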

Related resources

Comparison of Learning and Memory in Morphine Dependent Rats using Different Behavioral Models

There is conflicting evidence on the effect of morphine on learning and memory processes. In the present study, the effects of chronic morphine administration on passive avoidance, active avoidance, and spatial learning and memory in morphine-dependent male rats were investigated using Passive Avoidance shuttle box and Morris Water Maze tasks, respectively. Male rats received morphi...

Inquiry-Based Learning: A Model for Improving the Humanities in Iranian Higher education

The humanities in Iran face major challenges: graduate unemployment, localization, the reason for attending the classroom, and dissertations and articles that do not identify and structure real problems. To resolve these challenges, theorists have presented different approaches. This article analyzes the problematic sta...

Spatiotemporal Estimation of PM2.5 Concentration Using Remotely Sensed Data, Machine Learning, and Optimization Algorithms

PM2.5 (particles <2.5 μm in aerodynamic diameter) can be measured by ground stations in urban areas, but the number of these stations and their geographical coverage are limited. Therefore, these data are not adequate for calculating concentrations of PM2.5 over a large urban area. This study aims to use Aerosol Optical Depth (AOD) satellite images and meteorological data from 2014 to 2017 ...

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

A data stream is a sequence of data generated from various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, the stream is commonly divided into fixed-size windows or gradual forgetting is used. Concept drift refer...

Publication date: 2011